
Convolution: A Practical Review

Ric Losada - MathWorks - DSP Online Conference 2021 - Duration: 01:07:01

This session focuses on the practical aspects of convolution and FIR filtering and some of their applications, from algorithm exploration through to real-time signal processing.

We discuss the tradeoffs between time and frequency domain implementations, including latency and computational cost. For both domains we review the best-known implementation variants and some of the hardware technologies most used for performance acceleration.

This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.

What this presentation is about and why it matters

This talk is a practical tour of convolution and finite-impulse-response (FIR) filtering, with emphasis on how to implement long filters efficiently for real-time systems. It compares classical time-domain sample-by-sample implementations (direct and transposed forms), partially serialized and SIMD-accelerated implementations, and frequency-domain block methods based on the FFT (overlap-add and partitioned convolution). The speaker explains when each approach wins or loses in practice, and why design choices (for example, reducing filter length with IFIR or using polyphase/downsampling) can be as important as low-level optimizations.

Who will benefit the most from this presentation

This presentation is aimed at practicing engineers and students who implement real-time DSP systems: audio and spatial-audio developers, embedded and FPGA/ASIC designers, and anyone responsible for high-performance filtering in software or hardware. If you work on headphone binaural rendering, room acoustics, echo cancellation, or any application where filters are long or latency matters, you will get practical guidance and tradeoffs you can immediately test.

What you need to know

Before watching, a few core concepts will make the talk easier to follow:

  • Convolution basics: Linear convolution of discrete signals weights past input samples by the impulse response and sums them. In notation: y[n] = Σ_k x[k] h[n−k], with k running over all integers. With a finite impulse response (length L) the sum has only L terms and is directly implementable.
  • FIR structure choices: Direct-form (tap-delay line) and transposed structures implement the same transfer function, but differ in how state is stored, transient behavior, pipeline friendliness, and how easily they vectorize. For transient/initial-condition differences the state-space view explains why they can produce different outputs when coefficients change at runtime.
  • Complexity versus latency: Time-domain cost grows roughly as O(L) per output sample (L multiplies), while FFT-based block convolution trades compute for block buffering. For block length M and filter length L, frequency-domain linear convolution via the FFT requires choosing an FFT size N satisfying N ≥ L + M − 1. FFT approaches become computationally attractive as L and M grow, but they introduce buffering delay.
  • Optimizations and hardware: SIMD (AVX, NEON, etc.), FMA units, circular buffers, and coarse- and fine-grain parallelism change the practical crossover point between time-domain and frequency-domain methods. Compiler auto-vectorization or hand-tuned intrinsics can make naive time-domain code surprisingly fast.
  • Filter-design level savings: Techniques such as interpolated FIR (IFIR) and polyphase/downsampling can reduce the number of nonzero multiplies needed, often before any implementation-level optimization is attempted. Designing for multirate processing changes where the cost is paid.
  • Partitioned convolution: Long impulse responses can be split into partitions so that short input blocks reduce latency while using FFTs efficiently on each partition. This is essential in audio/binaural applications where perceived latency matters.
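The block structure behind overlap-add can be made concrete with a small sketch (illustrative Python, not code from the talk). For clarity each block here is convolved directly in the time domain; a real implementation would instead compute each block product with FFTs of size N ≥ L + M − 1:

```python
def conv_direct(x, h):
    """Direct linear convolution: O(len(x) * len(h)) multiplies."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

def conv_overlap_add(x, h, M=4):
    """Overlap-add: split x into blocks of M samples, convolve each
    block with h, then overlap and add the length M + L - 1 results.
    (A real implementation would do each block convolution with FFTs
    of size N >= L + M - 1 instead of conv_direct.)"""
    L = len(h)
    y = [0.0] * (len(x) + L - 1)
    for start in range(0, len(x), M):
        yb = conv_direct(x[start:start + M], h)  # length <= M + L - 1
        for i, v in enumerate(yb):               # overlap-add into output
            y[start + i] += v
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
assert all(abs(a - b) < 1e-12
           for a, b in zip(conv_direct(x, h), conv_overlap_add(x, h)))
```

By linearity, summing the block results reproduces the full linear convolution exactly; the price is the buffering delay of collecting M samples before a block can be processed.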

Glossary

  • Convolution — The discrete-time operation that blends input samples with an impulse response to produce output samples; the mathematical core of FIR filtering.
  • FIR (Finite Impulse Response) — A nonrecursive filter whose impulse response has finite length L and which is implemented via convolution.
  • Impulse Response (h[n]) — The filter coefficients; the time-domain pattern that defines how an FIR filter responds to an impulse.
  • Direct form — The straightforward tap-delay line FIR implementation where past inputs are multiplied by coefficients and summed.
  • Transposed form — An alternative FIR layout with delays between adders; often more pipeline- and SIMD-friendly and better for coarse-grain parallelism.
  • Overlap-add — An FFT-based block convolution method that breaks input into blocks, convolves each block with the full impulse response in the frequency domain, then overlaps and adds the outputs.
  • Partitioned convolution — Splitting a long impulse response into smaller blocks (partitions) to reduce latency while using FFT acceleration on each partition.
  • IFIR (Interpolated FIR) — A design technique that stretches a short prototype filter by inserting zeros between its coefficients to achieve a narrow transition band, then cascades it with an image-suppressor filter; the combination needs far fewer nonzero coefficients than a single long FIR.
  • Polyphase / Noble identities — Multirate identities that enable efficient combination of downsampling/upsampling and filtering for computational savings.
  • FFT (Fast Fourier Transform) — An algorithm to compute discrete Fourier transforms efficiently; used to turn convolution into pointwise multiplication in frequency-domain methods.
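As a concrete illustration of the partitioned-convolution entry above, here is a minimal uniform-partitioning sketch in Python (illustrative only, not from the presentation; real-time engines replace the per-partition time-domain convolution with FFT products so that short input blocks keep latency low):

```python
def conv_direct(x, h):
    """Direct linear convolution (reference)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(x)):
        for k in range(len(h)):
            y[n + k] += x[n] * h[k]
    return y

def conv_partitioned(x, h, P=2):
    """Uniform partitioned convolution: split h into chunks of P taps,
    convolve x with each chunk, and add each result back in, delayed
    by the chunk's offset within h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), P):
        yp = conv_direct(x, h[start:start + P])
        for i, v in enumerate(yp):
            y[start + i] += v
    return y

x = [1.0, 0.5, -2.0, 3.0]
h = [1.0, 2.0, 0.0, -1.0, 0.5]   # stands in for a long impulse response
assert all(abs(a - b) < 1e-12
           for a, b in zip(conv_partitioned(x, h), conv_direct(x, h)))
```

Splitting h this way is exact for any partition size P, which is what lets audio engines pick small partitions near the head of the impulse response (for low latency) and larger, FFT-friendly ones for the tail.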

Final note

This is a practical, no-nonsense talk: it pairs clear explanation of classical algorithms with measured benchmarks and real-world considerations (latency, SIMD, compiler behavior, multithreading, and filter design choices). Expect actionable rules of thumb rather than hard dogma: the best method depends on filter length, block size, hardware features, and whether you can redesign the filter. If you implement or optimize FIRs for audio, embedded systems, or spatial audio, you will come away with techniques you can try immediately and a better sense of which trade-offs matter.


fred_h
Score: 0 | 4 years ago | no reply

one very effective option... do the 5-to-1 down sampling in polyphase and then a 1-to-5 interpolator to keep the same input sample rate but have all operations at the reduced sample rate... always beats the IFIR filter, which raises the sample rate and uses wider registers to hold the zero-packed data.

fred_h
Score: 0 | 4 years ago | no reply

the longest room acoustic impulse response i ever heard was a cathedral in England... they wanted to show how the echoes significantly altered how a modern piece of music would sound in a long-echo environment... i think it was a MATLAB-based demo...

Darkphibre
Score: 0 | 4 years ago | no reply

On circular buffers: I believe you can get rid of the if statement with a modulus operator to recenter back to start if the memory is aligned, but ymmv whether branch mispredictions vs. division is cheaper.
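One branch-free variant of the circular-buffer wrap discussed in the comment above (an illustrative sketch, not from the talk): if the buffer length is a power of two, the index can wrap with a bitwise AND, avoiding both the branch and the division implied by a general modulus:

```python
SIZE = 8                  # buffer length; must be a power of two
MASK = SIZE - 1           # 0b0111: AND-ing with this wraps the index
buf = [0.0] * SIZE
write = 0                 # next write position

def push(sample):
    """Store a sample and advance the write index without any branch:
    (write + 1) & MASK is equivalent to (write + 1) % SIZE here."""
    global write
    buf[write] = sample
    write = (write + 1) & MASK

for s in range(10):       # 10 pushes wrap past the end of the buffer
    push(float(s))
# write is now 10 & 7 == 2; the two oldest samples were overwritten
```

This trades a small constraint (power-of-two length, common for FIR delay lines anyway) for fully predictable, division-free index arithmetic.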

Darkphibre
Score: 0 | 4 years ago | no reply

On slide SIMD Implementation: When doing SIMD, doesn't each register still rely on the prior results (the other registers) to finish? Did we shift gears away from the partial-parallel you were talking about for ASICs?